 separability assumption



Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

This paper reduces a broad class of machine learning problems involving latent variables to the problem of finding anchors defining the conical hull of the data (via the method of moments). In addition, it proposes a new divide-and-conquer algorithm based on random projections to speed up the search for the anchors. Overall, I found this an interesting paper presenting significant contributions. However, the presentation lacks clarity in places and could be greatly improved. It looks like the paper was squeezed in a hurry to fit the 8-page limit.



Divide-and-Conquer Learning by Anchoring a Conical Hull

Neural Information Processing Systems

We reduce a broad class of fundamental machine learning problems, usually addressed by EM or sampling, to the problem of finding the k extreme rays spanning the conical hull of a data point set. These k "anchors" lead to a global solution and a more interpretable model that can even outperform EM and sampling on generalization error. To find the k anchors, we propose a novel divide-and-conquer learning scheme "DCA" that distributes the problem to O(k log k) same-type sub-problems on different low-D random hyperplanes, each of which can be solved independently by any existing solver. For the 2D sub-problem, we instead present a non-iterative solver that only needs to compute an array of cosine values and its max/min entries. DCA also provides a faster subroutine inside other algorithms to check whether a point is covered in a conical hull, and thus improves these algorithms by providing significant speedups. We apply our method to GMM, HMM, LDA, NMF and subspace clustering, then show its competitive performance and scalability over other methods on large datasets.
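The 2D sub-problem mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' solver: it assumes all points are nonzero and lie strictly inside a common open half-plane, and it uses signed angles (via `arctan2`) relative to a mean direction rather than the cosine array the abstract describes. The function name `extreme_rays_2d` is hypothetical.

```python
import numpy as np

def extreme_rays_2d(points):
    """Return indices of the two extreme rays of the conical hull of 2D points.

    Assumption (not from the paper): every point is nonzero and all points
    lie strictly inside one open half-plane, so the cone is narrower than pi.
    """
    pts = np.asarray(points, dtype=float)
    # Only directions matter for a conical hull, so normalize each point.
    units = pts / np.linalg.norm(pts, axis=1, keepdims=True)
    # The mean of the unit vectors is a reference direction inside the cone.
    ref = units.mean(axis=0)
    # Signed angle of each point relative to the reference direction.
    dot = units @ ref
    cross = ref[0] * units[:, 1] - ref[1] * units[:, 0]
    angles = np.arctan2(cross, dot)
    # The smallest and largest signed angles give the two extreme rays.
    return int(np.argmin(angles)), int(np.argmax(angles))

if __name__ == "__main__":
    pts = [[1, 0], [1, 1], [0, 1], [2, 1], [1, 3]]
    print(extreme_rays_2d(pts))
```

The solver is non-iterative: one pass computes the angles and a min/max lookup selects the anchors, which is the property the abstract highlights for the 2D case.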


Review for NeurIPS paper: Recovery of sparse linear classifiers from mixture of responses

Neural Information Processing Systems

Summary and Contributions: This work initiates the study of the following generalization of 1-bit compressed sensing. There are unknown $k$-sparse vectors $w_1, \dots, w_\ell \in \mathbb{R}^d$, and one can query any vector $v$ and get back $\mathrm{sgn}(\langle v, w_i \rangle)$ for a random index $i$. The goal is to recover the $w_i$'s while minimizing the number of queries. This problem should not be confused with the problem of learning mixtures of halfspaces in the sense of distribution learning, as here the learner gets to pick the design vectors. A similar model in the context of regression has been studied before by Krishnamurthy et al. and Yin et al., as the authors acknowledge.
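The query model described in the summary can be simulated with a short sketch. The helper name `make_oracle`, the uniform choice of the index, and the seeding are assumptions made for illustration; the review only says the index is random.

```python
import numpy as np

def make_oracle(W, seed=0):
    """Build a query function for a mixture of sparse linear classifiers.

    W is an (ell, d) array whose rows are the unknown vectors w_1..w_ell.
    Assumption: the hidden index is drawn uniformly at random on each query.
    """
    W = np.asarray(W, dtype=float)
    rng = np.random.default_rng(seed)

    def query(v):
        # Pick one of the hidden vectors, then return the sign of <v, w_i>.
        i = rng.integers(W.shape[0])
        return int(np.sign(W[i] @ np.asarray(v, dtype=float)))

    return query

if __name__ == "__main__":
    # Two 1-sparse vectors in R^3 (hypothetical example data).
    query = make_oracle([[1, 0, 0], [0, -2, 0]])
    print([query([1, 1, 0]) for _ in range(10)])
```

Repeating the same query exposes the mixture: a vector such as `[1, 1, 0]` returns both signs across calls, whereas `[1, -1, 0]` always returns `+1` because both hidden vectors agree on it.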


Causal Representation Learning with Generative Artificial Intelligence: Application to Texts as Treatments

arXiv.org Artificial Intelligence

In this paper, we demonstrate how to enhance the validity of causal inference with unstructured high-dimensional treatments like texts, by leveraging the power of generative Artificial Intelligence. Specifically, we propose to use a deep generative model, such as a large language model (LLM), to efficiently generate treatments and use its internal representation for subsequent causal effect estimation. We show that the knowledge of this true internal representation helps disentangle the treatment features of interest, such as specific sentiments and certain topics, from other possibly unknown confounding features. Unlike existing methods, our proposed approach eliminates the need to learn a causal representation from the data and hence produces more accurate and efficient estimates. We formally establish the conditions required for the nonparametric identification of the average treatment effect, propose an estimation strategy that avoids the violation of the overlap assumption, and derive the asymptotic properties of the proposed estimator through the application of double machine learning. Finally, using an instrumental variables approach, we extend the proposed methodology to settings in which the treatment feature is based on human perception rather than assumed to be fixed given the treatment object. The proposed methodology is also applicable to text reuse, where an LLM is used to regenerate existing texts. We conduct simulation and empirical studies, using generated text data from an open-source LLM, Llama 3, to illustrate the advantages of our estimator over state-of-the-art causal representation learning algorithms.


Divide-and-Conquer Learning by Anchoring a Conical Hull Tianyi Zhou

Neural Information Processing Systems

We reduce a broad class of fundamental machine learning problems, usually addressed by EM or sampling, to the problem of finding the k extreme rays spanning the conical hull of a data point set. These k "anchors" lead to a global solution and a more interpretable model that can even outperform EM and sampling on generalization error. To find the k anchors, we propose a novel divide-and-conquer learning scheme "DCA" that distributes the problem to O(k log k) same-type sub-problems on different low-D random hyperplanes, each of which can be solved independently by any existing solver. For the 2D sub-problem, we instead present a non-iterative solver that only needs to compute an array of cosine values and its max/min entries. DCA also provides a faster subroutine inside other algorithms to check whether a point is covered in a conical hull, and thus improves these algorithms by providing significant speedups. We apply our method to GMM, HMM, LDA, NMF and subspace clustering, then show its competitive performance and scalability over other methods on large datasets.


Robust Spectral Inference for Joint Stochastic Matrix Factorization David Mimno Dept. of Computer Science Dept. of Information Science Cornell University

Neural Information Processing Systems

Spectral inference provides fast algorithms and provable optimality for latent topic analysis. But for real data these algorithms require additional ad-hoc heuristics, and even then often produce unusable results. We explain this poor performance by casting the problem of topic inference in the framework of Joint Stochastic Matrix Factorization (JSMF) and showing that previous methods violate the theoretical conditions necessary for a good solution to exist. We then propose a novel rectification method that learns high quality topics and their interactions even on small, noisy data. This method achieves results comparable to probabilistic techniques in several domains while maintaining scalability and provable optimality.


Minimum Volume Topic Modeling

arXiv.org Machine Learning

We propose a new topic modeling procedure that takes advantage of the fact that the Latent Dirichlet Allocation (LDA) log likelihood function is asymptotically equivalent to the logarithm of the volume of the topic simplex. This allows topic modeling to be reformulated as finding the probability simplex that minimizes its volume and encloses the documents that are represented as distributions over words. A convex relaxation of the minimum volume topic model optimization is proposed, and it is shown that the relaxed problem has the same global ...

There are many extensions of LDA, including a nonparametric extension based on the Dirichlet process called Hierarchical Dirichlet Process (Teh et al., 2005), a correlated topic extension based on the logistic normal prior on the topic proportions (Lafferty and Blei, 2006), and a time-varying topic modeling extension (Blei and Lafferty, 2006). There are two main approaches for estimation of the parameters of probabilistic topic models: the variational approximation popularized by Blei et al. (2003) and the sampling based approach studied by Pritchard et al. (2000).


The Informativeness of $k$-Means and Dimensionality Reduction for Learning Mixture Models

arXiv.org Machine Learning

The learning of mixture models can be viewed as a clustering problem. Indeed, given data samples independently generated from a mixture of distributions, we often would like to find the correct target clustering of the samples according to which component distribution they were generated from. For a clustering problem, practitioners often choose to use the simple k-means algorithm. k-means attempts to find an optimal clustering which minimizes the sum of squared distances between each point and its cluster center. In this paper, we provide sufficient conditions for the closeness of any optimal clustering and the correct target clustering, assuming that the data samples are generated from a mixture of log-concave distributions. Moreover, we show that under similar or even weaker conditions on the mixture model, any optimal clustering for the samples with reduced dimensionality is also close to the correct target clustering. These results provide intuition for the informativeness of k-means (with and without dimensionality reduction) as an algorithm for learning mixture models. We verify the correctness of our theorems using numerical experiments and demonstrate, using datasets with reduced dimensionality, significant speed-ups in the time required to perform clustering.
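The k-means objective described in the abstract can be made concrete with a small sketch. The function name `kmeans_objective` and the toy data are illustrative assumptions. The code only evaluates the sum of squared distances for a given clustering; it does not search for the optimal one.

```python
import numpy as np

def kmeans_objective(X, labels):
    """Sum of squared distances between each point and its cluster mean."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for c in np.unique(labels):
        members = X[labels == c]
        # For a fixed cluster, the mean is the center minimizing this sum.
        center = members.mean(axis=0)
        total += ((members - center) ** 2).sum()
    return float(total)

if __name__ == "__main__":
    X = [[0, 0], [2, 0], [10, 0], [12, 0]]
    print(kmeans_objective(X, [0, 0, 1, 1]))  # well-separated clustering
    print(kmeans_objective(X, [0, 1, 0, 1]))  # poor clustering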